Search and Ranking Algorithms for Locating Resources on the World Wide Web

Authors

  • Budi Yuwono
  • Dik Lun Lee

Abstract

Applying information retrieval techniques to the World Wide Web (WWW) environment is a unique challenge, mostly because of its hypertext/hypermedia nature and the richness of the meta-information it provides. We present four keyword-based search and ranking algorithms for locating WWW pages relevant to user queries. The first algorithm, Boolean Spread Activation, extends the notion of word occurrence in the Boolean retrieval model by propagating the occurrence of a query word in a page to other pages linked to it. The second algorithm, Most-cited, uses the number of citing hyperlinks between potentially relevant WWW pages to raise the relevance scores of the referenced pages above those of the referencing pages. The third algorithm, TFxIDF (the vector space model), is based on word distribution statistics. The last algorithm, Vector Spread Activation, combines the vector space model with the spread activation model. We conducted an experiment to evaluate the retrieval effectiveness of these algorithms and, from its results, draw conclusions about the nature of the WWW environment with respect to document ranking strategies.
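The four scoring schemes named in the abstract can be contrasted on a toy link graph. This is a minimal sketch, not the authors' formulation: the abstract gives no formulas, so the weights `c1`, `c2` and `alpha`, the link directions used for propagation, the toy `PAGES`/`LINKS` data, and the exact scoring rules are all assumptions for illustration. The TFxIDF function is a simplified score (sum of tf × log(N/df) over the query words) without the cosine normalization a full vector space model would apply.

```python
import math
from collections import Counter

# Hypothetical toy corpus: page id -> list of words.
PAGES = {
    "a": "web search ranking".split(),
    "b": "hypertext links web".split(),
    "c": "cooking recipes".split(),
}
# Hypothetical hyperlinks: source page -> set of target pages.
LINKS = {"a": {"b"}, "b": {"c"}, "c": set()}


def neighbours(page, links):
    """Pages linked to or from `page` (both directions)."""
    out = set(links.get(page, ()))
    inc = {src for src, targets in links.items() if page in targets}
    return (out | inc) - {page}


def boolean_spread(query, pages, links, c1=1.0, c2=0.5):
    """Assumed Boolean Spread Activation: each query word adds c1 if it
    occurs in the page, else c2 if it occurs in a linked page (c2 < c1)."""
    scores = {}
    for p, words in pages.items():
        s = 0.0
        for w in query:
            if w in words:
                s += c1
            elif any(w in pages[q] for q in neighbours(p, links)):
                s += c2
        scores[p] = s
    return scores


def most_cited(query, pages, links):
    """Assumed Most-cited: a page earns, for each other page citing it,
    the number of query words in that citing page, so citing pages with
    no query words (not potentially relevant) contribute nothing."""
    scores = {p: 0 for p in pages}
    for src, targets in links.items():
        hits = sum(1 for w in query if w in pages[src])
        for dst in targets:
            if dst != src:
                scores[dst] += hits
    return scores


def tfidf(query, pages):
    """Simplified TFxIDF: sum over query words of tf(w, page) * log(N / df(w)).
    Query words absent from the corpus (df == 0) are skipped."""
    n = len(pages)
    df = Counter(w for words in pages.values() for w in set(words))
    scores = {}
    for p, words in pages.items():
        tf = Counter(words)
        scores[p] = sum(tf[w] * math.log(n / df[w]) for w in query if df[w])
    return scores


def vector_spread(query, pages, links, alpha=0.5):
    """Assumed Vector Spread Activation: a page's TFxIDF score plus
    alpha times the TFxIDF scores of the pages that link to it."""
    base = tfidf(query, pages)
    scores = dict(base)
    for src, targets in links.items():
        for dst in targets:
            if dst != src:
                scores[dst] += alpha * base[src]
    return scores
```

Each function returns a page-to-score dictionary, so a ranked hit list is `sorted(scores, key=scores.get, reverse=True)`. With the query `["web", "ranking"]`, page `b` contains only "web" but gains partial credit in the spread-activation variants through its link from `a`, which illustrates the propagation idea the abstract describes.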

Similar resources

A New Hybrid Method for Web Pages Ranking in Search Engines

There are many algorithms for optimizing search engine results; ranking takes place according to one or more parameters such as backward links, forward links, content, click-through rate, etc. The quality and performance of these algorithms depend on the listed parameters. Ranking is one of the most important components of the search engine that represents the degree of the vitality...

A New Model for Phrase Search Based on Weighted Minimum Displacement

Finding high-quality web pages is one of the most important tasks of search engines. The relevance of the retrieved documents to the query depends on the user's judgment, which increases the complexity of ranking algorithms. Another issue is that users often explore just the first 10 to 20 results, while millions of pages related to a query may exist. So search engines have to use sui...

A Technique for Improving Web Mining using Enhanced Genetic Algorithm

The World Wide Web is growing at a very fast pace and makes a lot of information available to the public. Search engines use conventional methods to retrieve information on the Web; however, the search results of these engines can still be refined, and their accuracy is not high enough. One approach to web mining is evolutionary algorithms, which search according to the user interests...

Ranking the Pages of the World Wide Web

Query search engines are fundamental tools for locating documents that satisfy Web surfers' interests. A Web search engine enumerates no more than a few hundred documents for any keyword search query. The quality of a search engine largely depends on its ranking algorithm, the heuristics applied for selecting the hit list from the pages containing the keyword. This extended abstract discuss...

Popularity-Based Relevance Propagation

It is evident that information resources on the World Wide Web (WWW) are growing rapidly at an unpredictable rate. Under these circumstances, web search engines help users find useful information. Ranking the retrieved results is the main challenge of every search engine. There are some ranking algorithms based on content and connectivity, such as BM25 and PageRank. Due to low precision of the...

Publication date: 1996